Super-resolution of X-ray CT images of rock samples by sparse representation: applications to the complex texture of serpentinite

X-ray computed tomography (X-ray CT) has been widely used in the earth sciences, as it is a non-destructive method for imaging the three-dimensional structures of rocks and sediments. Rock samples inherently possess structures at various scales, from millimeter- to centimeter-scale layering and veins to micrometer-scale mineral grains and pores. Because of limitations on the X-ray CT scanner, sample size, and scanning time, it is not easy to extract information on multi-scale structures, even when core samples hundreds of meters long are obtained during drilling projects. As a first step toward overcoming this trade-off between scale and resolution, we applied a super-resolution technique based on sparse representation and dictionary learning to X-ray CT images of rock core samples. By applying the technique to serpentinized peridotite, which records multi-stage water–rock interactions, we show that the grain shapes, veins, and background heterogeneities of high-resolution images can be reconstructed through super-resolution. We also show the potential effectiveness of sparse super-resolution for feature extraction from complicated rock textures.

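Here, we describe the detailed formulation of the proposed sparse super-resolution for rock CT images. A small area of an image (i.e., a patch) $\boldsymbol{y}_i$ is assumed to be expressed by basis images with a sparse vector $\boldsymbol{x}_i$. Corresponding high- and low-resolution images are assumed to have a common sparse vector as follows:

$$\boldsymbol{y}^{\mathrm{high}}_i = \boldsymbol{D}^{\mathrm{high}} \boldsymbol{x}_i, \qquad \boldsymbol{y}^{\mathrm{low}}_i = \boldsymbol{D}^{\mathrm{low}} \boldsymbol{x}_i, \tag{S1}$$

where the dictionaries for high-resolution and low-resolution images are expressed by

$$\boldsymbol{D}^{\mathrm{high}} = \{\boldsymbol{d}^{\mathrm{high}}_1, \cdots, \boldsymbol{d}^{\mathrm{high}}_K\}, \qquad \boldsymbol{D}^{\mathrm{low}} = \{\boldsymbol{d}^{\mathrm{low}}_1, \cdots, \boldsymbol{d}^{\mathrm{low}}_K\} \tag{S2}$$

($K$: the number of basis images).

The sparse super-resolution comprises two steps: dictionary learning using high-resolution images, and reconstruction of the high-resolution image by applying sparse coefficients obtained from the low-resolution image to the high-resolution image representation (Fig. 1 in main article). In both steps, the $L_1$ norm is used to realize sparse representation. The size of the image patches is adjusted by trial and error.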
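In the dictionary learning step, we obtain a dictionary of basis images $\boldsymbol{D}^{\mathrm{high}}$ from the high-resolution patch images $\boldsymbol{Y}^{\mathrm{high}} = \{\boldsymbol{y}^{\mathrm{high}}_1, \cdots, \boldsymbol{y}^{\mathrm{high}}_{P_{\mathrm{DL}}}\}$ ($P_{\mathrm{DL}}$: the total number of patch images used for dictionary learning, which are obtained from high-resolution images). Here, we simultaneously optimize a high-resolution dictionary $\boldsymbol{D}^{\mathrm{high}}$ and a matrix of sparse vectors $\boldsymbol{X}$ as follows:

$$(\boldsymbol{D}^{\mathrm{high}}_{\mathrm{est}}, \boldsymbol{X}_{\mathrm{est}}) = \arg\min_{\boldsymbol{D}^{\mathrm{high}},\, \boldsymbol{X}} \left\| \boldsymbol{Y}^{\mathrm{high}} - \boldsymbol{D}^{\mathrm{high}} \boldsymbol{X} \right\|_2^2 + \lambda \left\| \boldsymbol{X} \right\|_1, \tag{S3}$$

where the first term represents the discrepancies between the high-resolution patch images $\boldsymbol{Y}^{\mathrm{high}}$ and the corresponding reconstructed images $\boldsymbol{D}^{\mathrm{high}} \boldsymbol{X}$, and the second term is an $L_1$ regularization term for the sparsity condition. Here, $\lambda$ is a regularization parameter that controls the sparsity. Eq. (S3) is optimized iteratively, updating one of the two factors ($\boldsymbol{D}^{\mathrm{high}}$, $\boldsymbol{X}$) at a time while the other is held fixed, as follows:

$$\boldsymbol{X}^{*} = \arg\min_{\boldsymbol{X}} \left\| \boldsymbol{Y}^{\mathrm{high}} - \boldsymbol{D}^{\mathrm{high}*} \boldsymbol{X} \right\|_2^2 + \lambda \left\| \boldsymbol{X} \right\|_1, \tag{S4}$$

$$\boldsymbol{D}^{\mathrm{high}*} = \arg\min_{\boldsymbol{D}^{\mathrm{high}}} \left\| \boldsymbol{Y}^{\mathrm{high}} - \boldsymbol{D}^{\mathrm{high}} \boldsymbol{X}^{*} \right\|_2^2 \quad \text{subject to} \quad \left\| \boldsymbol{d}^{\mathrm{high}}_k \right\|_2^2 \le 1 \ \text{for every basis image } \boldsymbol{d}^{\mathrm{high}}_k, \tag{S5}$$

where $\boldsymbol{X}^{*}$ and $\boldsymbol{D}^{\mathrm{high}*}$ are the tentatively estimated sparse vectors and dictionary, respectively. Note that the second term in Eq. (S4) is a constraint for sparsity, whereas the inequality in Eq. (S5) removes the scaling ambiguity. The minimizations in Eqs. (S4) and (S5) are solved by the least absolute shrinkage and selection operator (LASSO) method and as a quadratically constrained quadratic program (QCQP), respectively. The low-resolution dictionary $\boldsymbol{D}^{\mathrm{low}}_{\mathrm{est}}$ is derived from the obtained high-resolution dictionary $\boldsymbol{D}^{\mathrm{high}}_{\mathrm{est}}$ by using a downsampling matrix $\boldsymbol{L}$ as $\boldsymbol{D}^{\mathrm{low}}_{\mathrm{est}} = \boldsymbol{L} \boldsymbol{D}^{\mathrm{high}}_{\mathrm{est}}$.

In the super-resolution step, we first estimate sparse vectors that can reconstruct the low-resolution patch images $\tilde{\boldsymbol{Y}}^{\mathrm{low}} = \{\tilde{\boldsymbol{y}}^{\mathrm{low}}_1, \tilde{\boldsymbol{y}}^{\mathrm{low}}_2, \cdots, \tilde{\boldsymbol{y}}^{\mathrm{low}}_{P_{\mathrm{SR}}}\}$ ($P_{\mathrm{SR}}$: the total number of patch images for super-resolution) in terms of a small number of basis images.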
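The alternating updates of Eqs. (S3)-(S5), together with the sparse coding that the super-resolution step reuses, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the LASSO sub-problem is solved here by iterative soft-thresholding (ISTA), the QCQP of Eq. (S5) is replaced by a simpler least-squares fit followed by rescaling of any basis image whose norm exceeds one, and the downsampling matrix is assumed to be block averaging. The function names (`lasso_ista`, `update_dictionary`, `learn_dictionary`, `block_average_matrix`) and all default parameter values are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * |v|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(D, Y, lam, n_iter=200):
    """Approximately minimize ||Y - D X||^2 + lam * ||X||_1 over X (cf. Eq. S4) by ISTA."""
    X = np.zeros((D.shape[1], Y.shape[1]))
    # Lipschitz constant of the gradient of the quadratic term (guarded against zero).
    lip = max(2.0 * np.linalg.norm(D, 2) ** 2, 1e-12)
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ X - Y)
        X = soft_threshold(X - grad / lip, lam / lip)
    return X

def update_dictionary(Y, X, eps=1e-8):
    """Stand-in for Eq. (S5): regularized least-squares fit, then shrink columns with norm > 1.

    Assumption: this heuristic replaces the exact QCQP solver named in the text.
    """
    D = Y @ X.T @ np.linalg.pinv(X @ X.T + eps * np.eye(X.shape[0]))
    norms = np.linalg.norm(D, axis=0)
    return D / np.maximum(norms, 1.0)

def learn_dictionary(Y_high, n_atoms, lam, n_outer=10, seed=0):
    """Alternate the X-update and the D-update, starting from a random unit-norm dictionary."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y_high.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        X = lasso_ista(D, Y_high, lam)
        D = update_dictionary(Y_high, X)
    # Final sparse codes for the returned dictionary.
    return D, lasso_ista(D, Y_high, lam)

def block_average_matrix(p, s):
    """Assumed downsampling matrix L: averages s x s blocks of a flattened (row-major) p x p patch."""
    q = p // s
    L = np.zeros((q * q, p * p))
    for r in range(q):
        for c in range(q):
            for dr in range(s):
                for dc in range(s):
                    L[r * q + c, (r * s + dr) * p + (c * s + dc)] = 1.0 / (s * s)
    return L
```

With patches stored as columns, a low-resolution dictionary would then be obtained as `D_low = block_average_matrix(p, s) @ D_high`, and `lasso_ista(D_low, Y_low, lam)` would give the sparse vectors for the low-resolution patches.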
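For appropriate reconstruction, a matrix of sparse vectors, $\tilde{\boldsymbol{X}}$, is optimized by minimizing the following expression:

$$\tilde{\boldsymbol{X}}_{\mathrm{est}} = \arg\min_{\tilde{\boldsymbol{X}}} \left\| \tilde{\boldsymbol{Y}}^{\mathrm{low}} - \boldsymbol{D}^{\mathrm{low}}_{\mathrm{est}} \tilde{\boldsymbol{X}} \right\|_2^2 + \lambda \left\| \tilde{\boldsymbol{X}} \right\|_1. \tag{S6}$$

In each patch, the following optimization is individually conducted:

$$\tilde{\boldsymbol{x}}_{i,\mathrm{est}} = \arg\min_{\tilde{\boldsymbol{x}}_i} \left\| \tilde{\boldsymbol{y}}^{\mathrm{low}}_i - \boldsymbol{D}^{\mathrm{low}}_{\mathrm{est}} \tilde{\boldsymbol{x}}_i \right\|_2^2 + \lambda \left\| \tilde{\boldsymbol{x}}_i \right\|_1. \tag{S7}$$

In principle, the high-resolution patch image $\tilde{\boldsymbol{y}}^{\mathrm{high}}_{i,\mathrm{est}}$ can be reconstructed by using the high-resolution dictionary $\boldsymbol{D}^{\mathrm{high}}_{\mathrm{est}}$ and the estimated sparse vector $\tilde{\boldsymbol{x}}_{i,\mathrm{est}}$ as follows:

$$\tilde{\boldsymbol{y}}^{\mathrm{high}}_{i,\mathrm{est}} = \boldsymbol{D}^{\mathrm{high}}_{\mathrm{est}} \tilde{\boldsymbol{x}}_{i,\mathrm{est}}. \tag{S8}$$

However, the above individual optimization may induce unnatural boundaries between neighboring patches. To avoid this effect, we considered optimization with an overlap between the current target patch and previously reconstructed patches as follows:

$$\tilde{\boldsymbol{x}}_{i,\mathrm{est}} = \arg\min_{\tilde{\boldsymbol{x}}_i} \left\| \tilde{\boldsymbol{y}}^{\mathrm{low}}_i - \boldsymbol{D}^{\mathrm{low}}_{\mathrm{est}} \tilde{\boldsymbol{x}}_i \right\|_2^2 + \lambda \left\| \tilde{\boldsymbol{x}}_i \right\|_1 + \beta \sum_{j \in N(i)} \left\| \boldsymbol{P}_{i,j} \left( \boldsymbol{D}^{\mathrm{high}}_{\mathrm{est}} \tilde{\boldsymbol{x}}_i - \tilde{\boldsymbol{y}}^{\mathrm{high}}_{j,\mathrm{est}} \right) \right\|_2^2, \tag{S9}$$

where $\boldsymbol{P}_{i,j}$ extracts the region of overlap between the current target patch $i$ and a previously reconstructed high-resolution patch image $j$. Here, $N(i)$ is the set of patches neighboring the current patch $i$ whose high-resolution patch images have already been reconstructed, and $\beta$ is a positive constant.

By assuming that the weight matrix $\tilde{\boldsymbol{X}}$ is common to the high-resolution and low-resolution images, the high-resolution patch images $\tilde{\boldsymbol{Y}}^{\mathrm{high}}_0$ can be reconstructed from the high-resolution dictionary and the estimated weight matrix $\tilde{\boldsymbol{X}}_{\mathrm{est}}$ as follows:

$$\tilde{\boldsymbol{Y}}^{\mathrm{high}}_0 = \boldsymbol{D}^{\mathrm{high}}_{\mathrm{est}} \tilde{\boldsymbol{X}}_{\mathrm{est}}. \tag{S10}$$

The high-resolution patch images $\tilde{\boldsymbol{Y}}^{\mathrm{high}}_0$ may not satisfy the constraint between high-resolution and low-resolution images, $\boldsymbol{Y}^{\mathrm{low}} = \boldsymbol{L} \boldsymbol{Y}^{\mathrm{high}}$. To satisfy the constraint, we consider the following update for the high-resolution images:

$$\tilde{\boldsymbol{Y}}^{\mathrm{high}}_{*} = \arg\min_{\tilde{\boldsymbol{Y}}^{\mathrm{high}}} \left\| \tilde{\boldsymbol{Y}}^{\mathrm{low}} - \boldsymbol{L} \tilde{\boldsymbol{Y}}^{\mathrm{high}} \right\|_2^2 + c \left\| \tilde{\boldsymbol{Y}}^{\mathrm{high}} - \tilde{\boldsymbol{Y}}^{\mathrm{high}}_0 \right\|_2^2, \tag{S11}$$

where $c$ is a positive constant. The first term in Eq. (S11) enforces the relation between the high- and low-resolution images, and the second term keeps the result close to the high-resolution image reconstructed by sparse modeling. Eq. (S11) is optimized by the steepest descent method. The iterative equation for the high-resolution image is expressed as follows:

$$\tilde{\boldsymbol{Y}}^{\mathrm{high}}_{k+1} = \tilde{\boldsymbol{Y}}^{\mathrm{high}}_{k} + \gamma \left[ \boldsymbol{L}^{\mathsf{T}} \left( \tilde{\boldsymbol{Y}}^{\mathrm{low}} - \boldsymbol{L} \tilde{\boldsymbol{Y}}^{\mathrm{high}}_{k} \right) + c \left( \tilde{\boldsymbol{Y}}^{\mathrm{high}}_0 - \tilde{\boldsymbol{Y}}^{\mathrm{high}}_{k} \right) \right], \tag{S12}$$

where $\tilde{\boldsymbol{Y}}^{\mathrm{high}}_{k}$ is the high-resolution image obtained at iteration $k$. Here, $\gamma$ is a positive constant.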
We regard $\tilde{\boldsymbol{Y}}^{\mathrm{high}}_{k}$ at sufficiently large $k$ as the estimated high-resolution image $\tilde{\boldsymbol{Y}}^{\mathrm{high}}_{*}$.
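The steepest-descent refinement described above can be illustrated with a short NumPy sketch. It assumes that patches are stored as columns, that the downsampling operator is available as an explicit matrix, and that the factor of 2 from the gradient is absorbed into the step size. The function name `refine_high_res`, the default value of `c`, and the automatic step-size rule are hypothetical choices for illustration, not values taken from this study.

```python
import numpy as np

def refine_high_res(Y_low, Y_high0, L, c=0.1, gamma=None, n_iter=200):
    """Steepest-descent refinement of a sparse-coding reconstruction.

    Iterates Y_{k+1} = Y_k + gamma * [L^T (Y_low - L Y_k) + c * (Y_high0 - Y_k)],
    which decreases ||Y_low - L Y||^2 + c * ||Y - Y_high0||^2 for a small enough step.
    """
    if gamma is None:
        # Assumed step-size rule (not from the source): 1 / (sigma_max(L)^2 + c)
        # is below the stability limit 2 / (sigma_max(L)^2 + c).
        gamma = 1.0 / (np.linalg.norm(L, 2) ** 2 + c)
    Y = Y_high0.copy()
    for _ in range(n_iter):
        data_term = L.T @ (Y_low - L @ Y)   # pulls L Y toward the observed low-resolution patches
        prior_term = c * (Y_high0 - Y)      # keeps Y close to the sparse reconstruction
        Y = Y + gamma * (data_term + prior_term)
    return Y
```

Because the objective is quadratic, the iteration converges to the solution of the linear system $(\boldsymbol{L}^{\mathsf{T}}\boldsymbol{L} + c\boldsymbol{I})\,\boldsymbol{Y} = \boldsymbol{L}^{\mathsf{T}}\boldsymbol{Y}^{\mathrm{low}} + c\,\boldsymbol{Y}_0$, which offers a simple check on whether the number of iterations is sufficient.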

Image quality indices
We considered two image quality indices for comparing two images $I = \{i_{x,y}\}$ and $J = \{j_{x,y}\}$ ($x, y$: pixel coordinates in two-dimensional space). If one of the two images is the true high-resolution image and the other is the estimated image, the following indices can be used to quantify the performance of the super-resolution method.

The peak signal-to-noise ratio (PSNR) is defined by using the mean-squared error:

$$\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}, \tag{S13}$$

where MSE is the mean-squared error between images $I$ and $J$ as follows:

$$\mathrm{MSE} = \frac{1}{n} \sum_{x,y} \left( i_{x,y} - j_{x,y} \right)^2. \tag{S14}$$

Here, $n$ is the total number of pixels, and MAX is the maximum pixel value that the image format can take. As expressed in Eq. (S14), the PSNR measures the discrepancy between images at the pixel level.

The structural similarity index measure (SSIM) is defined by the average and variance of each image as follows:

$$\mathrm{SSIM} = \frac{\left( 2 \mu_i \mu_j + c_1 \right) \left( 2 \sigma_{ij} + c_2 \right)}{\left( \mu_i^2 + \mu_j^2 + c_1 \right) \left( \sigma_i^2 + \sigma_j^2 + c_2 \right)}, \tag{S15}$$

where $\mu_i$ and $\mu_j$ are the averaged pixel values of images $I$ and $J$, respectively, $\sigma_i^2$ and $\sigma_j^2$ are the variances of images $I$ and $J$, respectively, and $\sigma_{ij}$ is the covariance of images $I$ and $J$. Here, $c_1$ and $c_2$ are constants. As expressed in Eq. (S15), the SSIM evaluates the discrepancy between images using statistics of the image structure rather than pixel-level differences.
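Both indices can be computed in a few lines of NumPy, as in the sketch below. It evaluates the SSIM once over the whole image, exactly as the formula above is written; widely used implementations instead compute it in local windows and average the result, so values may differ from those of such libraries. The constants $c_1 = (0.01\,\mathrm{MAX})^2$ and $c_2 = (0.03\,\mathrm{MAX})^2$ are a common convention assumed here, since the text does not specify them, and the function names `psnr` and `ssim_global` are hypothetical.

```python
import numpy as np

def psnr(img_i, img_j, max_value=255.0):
    """Peak signal-to-noise ratio in dB, from the pixel-wise mean-squared error."""
    mse = np.mean((np.asarray(img_i, dtype=float) - np.asarray(img_j, dtype=float)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

def ssim_global(img_i, img_j, max_value=255.0, k1=0.01, k2=0.03):
    """SSIM from whole-image means, variances, and covariance (single global window)."""
    i = np.asarray(img_i, dtype=float)
    j = np.asarray(img_j, dtype=float)
    # Assumed stabilizing constants (common convention, not given in the text).
    c1 = (k1 * max_value) ** 2
    c2 = (k2 * max_value) ** 2
    mu_i, mu_j = i.mean(), j.mean()
    var_i, var_j = i.var(), j.var()
    cov_ij = np.mean((i - mu_i) * (j - mu_j))
    numerator = (2.0 * mu_i * mu_j + c1) * (2.0 * cov_ij + c2)
    denominator = (mu_i ** 2 + mu_j ** 2 + c1) * (var_i + var_j + c2)
    return numerator / denominator
```

For two identical images, `ssim_global` returns 1 and `psnr` returns infinity; both values decrease as the estimated image departs from the reference.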